@bigximik
Contributor

✨ Description

Implements the dense Qwen2 checkpoint converter, following the Hugging Face Transformers Qwen2 implementation.

The use_sliding_window, sliding_window, and max_window_layers parameters from the HF Qwen2 configuration are ignored during conversion because they are not architecture parameters. The Mistral checkpoint converter handles its sliding_window parameter the same way.
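To illustrate what "ignored during conversion" means for the three parameters named above, here is a minimal sketch. It is not the converter code from this PR: the helper `strip_ignored_keys`, the constant `IGNORED_QWEN2_KEYS`, and the example config values are all hypothetical and exist only for illustration.

```python
# Hypothetical sketch, not the converter implemented in this PR.
# The three keys below are the HF Qwen2 config parameters the description
# says are ignored on import.
IGNORED_QWEN2_KEYS = ("use_sliding_window", "sliding_window", "max_window_layers")


def strip_ignored_keys(hf_config: dict) -> tuple:
    """Split an HF-style config dict into (kept, ignored) parts."""
    kept = {k: v for k, v in hf_config.items() if k not in IGNORED_QWEN2_KEYS}
    ignored = {k: v for k, v in hf_config.items() if k in IGNORED_QWEN2_KEYS}
    return kept, ignored


# Illustrative values only; they are not taken from a real checkpoint.
example_config = {
    "hidden_size": 896,
    "num_hidden_layers": 24,
    "use_sliding_window": False,
    "sliding_window": 32768,
    "max_window_layers": 21,
}

kept, ignored = strip_ignored_keys(example_config)
print("kept:", sorted(kept))
print("ignored:", sorted(ignored))
```

Returning the ignored entries separately, rather than silently dropping them, would make it easy to log what was skipped if that turns out to be useful.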

Part of #135.

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

📝 Changes

List the key changes introduced in this PR:

  1. Implements dense Qwen2 converter

✅ Checklist

Make sure the following tasks are completed before submitting the PR:

General

  • 📜 I have read and followed the contributing guidelines.
  • 🏷️ I am using a clear and descriptive PR title that summarizes the key change or feature introduced.
  • 🎉 The functionality is complete, and I have tested the changes.
  • 📝 I have updated the documentation if needed. (not applicable)
  • ⚠️ The change does not introduce any new issues (e.g., runtime warnings, type checker errors, linting problems, unhandled edge cases).
  • 🧩 I have commented my code, especially in hard-to-understand areas.

Dependencies and Configuration

  • 🐋 I have updated the Docker configuration or dependencies, if applicable. (not applicable)
  • 🔄 I have ensured compatibility with the existing setup after dependency changes. (not applicable)

Testing

  • 🧪 I have added or updated tests to cover my changes.
  • ✔️ New and existing tests pass locally with my changes. (tested the affected Llama, StarCoder2, and Qwen2 conversions)
  • 🚦 I have tested these changes on GPUs and verified training stability.
  • 🏋️ I have tested the changes on realistic training workloads, if applicable.

Performance Impact (not applicable)

Collaborator

@tscholak tscholak left a comment

Great work, @bigximik!
I think next in line is #166, wdyt?



@dataclasses.dataclass
class IgnoreImportQwen2SlidingWindowParamsConverter(ParamConverter):
Collaborator

@bigximik this is fine, but can you please add a TODO here saying that this is a temporary hack until we can load these params from the config?

@tscholak merged commit 23006dc into main on Mar 4, 2025
4 checks passed
@tscholak deleted the qwen2 branch on March 4, 2025 13:54

3 participants